Expected Sequence Similarity Maximization

Authors

  • Cyril Allauzen
  • Shankar Kumar
  • Wolfgang Macherey
  • Mehryar Mohri
  • Michael Riley
Abstract

This paper presents efficient algorithms for expected similarity maximization, which coincides with minimum Bayes decoding for a similarity-based loss function. Our algorithms are designed for similarity functions that are sequence kernels in a general class of positive definite symmetric kernels. We discuss both a general algorithm and a more efficient algorithm applicable in a common unambiguous scenario. We also describe the application of our algorithms to machine translation and report the results of experiments with several translation data sets which demonstrate a substantial speed-up. In particular, our results show a speed-up by two orders of magnitude with respect to the original method of Tromble et al. (2008) and by a factor of 3 or more even with respect to an approximate algorithm specifically designed for that task. These results open the path for the exploration of more appropriate or optimal kernels for the specific tasks considered.
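To make the objective in the abstract concrete, the following is a minimal brute-force sketch of expected similarity maximization over a small, explicitly listed hypothesis set. It is not the algorithm of the paper, which targets much larger hypothesis sets efficiently. The function names, the choice of an n-gram count kernel, and the toy probabilities are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order 1..n in a token sequence."""
    counts = Counter()
    for k in range(1, n + 1):
        for i in range(len(tokens) - k + 1):
            counts[tuple(tokens[i:i + k])] += 1
    return counts


def ngram_kernel(x, y, n=2):
    """Inner product of n-gram count vectors (an illustrative sequence kernel)."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    # Iterate over the smaller map; Counter returns 0 for missing keys.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(c * cy[g] for g, c in cx.items())


def expected_similarity_argmax(hypotheses, probs, n=2):
    """Return (x, score) maximizing sum_y p(y) * K(x, y) by exhaustive search.

    Assumes `probs` is a distribution over `hypotheses` (hypothetical input).
    Cost is quadratic in the number of hypotheses.
    """
    best, best_score = None, float("-inf")
    for x in hypotheses:
        score = sum(p * ngram_kernel(x, y, n) for y, p in zip(hypotheses, probs))
        if score > best_score:
            best, best_score = x, score
    return best, best_score


# Toy example: three candidate translations with assumed posterior weights.
hyps = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"]]
probs = [0.4, 0.35, 0.25]
best, score = expected_similarity_argmax(hyps, probs)
```

Because this kernel is unnormalized, longer hypotheses tend to score higher; a practical similarity function would usually normalize or weight the n-gram matches. The exhaustive loop also shows why a faster method matters: the cost grows quadratically with the number of hypotheses.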

Similar resources

Relation between weight matrix and substitution matrix: motif search by similarity

MOTIVATION: The discovery of patterns shared by several sequences that differ greatly is a basic task in sequence analysis, and still a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. One cannot guarantee the optimality of the resu...


Testable implications of subjective expected utility theory

I show that the predictive content of the hypothesis of subjective expected utility maximization critically depends on what the analyst knows about the details of the problem a particular decision maker faces. When the analyst does not know anything about the agent’s payoffs or beliefs and can only observe the sequence of actions taken by the decision maker any arbitrary sequence of actions can...


Flexible Reward Plans to Elicit Truthful Predictions in Crowdsourcing

We develop a flexible reward plan to elicit truthful predictive probability distribution over a set of uncertain events from workers. In our reward plan, the principal can assign rewards for incorrect predictions according to her similarity between events. In the spherical proper scoring rule, a worker’s expected utility is represented as the inner product of her truthful predictive probability...


Analysis of protein sequence/structure similarity relationships.

Current analyses of protein sequence/structure relationships have focused on expected similarity relationships for structurally similar proteins. To survey and explore the basis of these relationships, we present a general sequence/structure map that covers all combinations of similarity/dissimilarity relationships and provide novel energetic analyses of these relationships. To aid our analysis...


CMfinder - a covariance model based RNA motif finding algorithm

MOTIVATION: The recent discoveries of large numbers of non-coding RNAs and computational advances in genome-scale RNA search create a need for tools for automatic, high quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal. RESULTS: CMfinder is a new tool to predict RNA motifs in unaligned sequenc...


Publication date: 2010